System and Method for Dynamic Adaptive Video Streaming Using Model Predictive Control

ABSTRACT

A client video player device downloads video content from a video content delivery network as segments encoded at respective bitrates selected from distinct encoding bitrates. Bitrate adaptation logic within the client video player selects the appropriate bitrate segment in order to maximize user-perceived Quality-of-Experience. An optimization of this bitrate adaptation logic implementing model predictive control that maximizes the user-perceived Quality-of-Experience is presented.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/177,904, filed Mar. 26, 2015.

GOVERNMENT INTEREST

This invention was made with government support under National Science Foundation No. ECCS0925964. The government has certain rights in this invention.

FIELD OF THE INVENTION

This invention relates to streaming of video content over the Internet, specifically relating to optimizations of client-side adaptive bitrate streaming players to maximize user quality-of-experience (QoE).

BACKGROUND OF THE INVENTION

With more and more content providers delivering video stream services over the Internet, user-perceived quality-of-experience has become an important differentiator. The quality-of-experience metric includes duration of rebuffering, startup delay, average playback bitrate, and the stability of that bitrate. There is little to no support in the network for optimizing or controlling these characteristics, forcing the client player unit to cope with the intermittent congestion, diverse bottlenecks, and other complexities of the Internet.

Modern client video players use bitrate adaptation logic in order to achieve a high quality-of-experience. Many proprietary implementations of video players have been fielded, but the first adaptive bit-rate HTTP-based streaming solution that is an international standard is Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH. The logic that performs the MPEG-DASH bit-rate adaptation within the video player unit, while currently superior to non-bit-rate adaptive players, has not been thoroughly optimized for quality-of-experience.

SUMMARY OF THE INVENTION

MPEG-DASH works by breaking the video content into a sequence of small HTTP-based file segments, each segment containing a short interval of playback time of content that is potentially many hours in duration, such as a movie or the live broadcast of a sports event. The content is made available at a variety of different bit rates. In other words, alternative segments encoded at different bit rates covering aligned short intervals of playback time are made available. In order to seamlessly adapt to changing network conditions and provide high quality playback with fewer stalls or re-buffering events, the MPEG-DASH client selects the next segment to download and play back from the different bit rate alternatives based on either current network conditions or current playback buffer occupancy.

This invention offers an alternative method to choose the next segment from the different bit rate alternatives via model predictive control (MPC), a systematic combination of buffer occupancy and bandwidth predictions. This novel technique creates a video playback system whose performance is near optimal.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of the adaptive video player of the present invention.

FIG. 2 shows an exemplary graph of buffer occupancy as a function of time (i.e., buffer dynamics).

FIG. 3 shows the basic algorithm used by a video player of the present invention.

FIG. 4 shows a table showing the enumeration of possible scenarios in the FastMPC version of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

For purposes herein, a “video player” shall be defined as any device capable of streaming video from a network connection, including, for example, via WiFi, Bluetooth, a cellular data connection such as LTE, a hardwired connection or via any means of connecting to a server capable of serving video at mixed bitrates. Such devices include, but are not limited to, smart televisions, projectors, video streaming devices (AppleTV, ChromeCast®, Amazon Fire Stick, Roku™, etc.), video gaming systems, smart phones, tablets and software-based video players running on generic computing devices.

The client-side video player, as perceived by a user, requires many components, including a video display screen, a video display subsystem with buffering, a networking interface, a processor of some sort in order to perform the networking functions and HTTP processing, and logic to perform the bitrate adaptation method described in detail below (implemented in an integrated circuit module and/or in software on a general purpose processor). A component model of the adaptive video player is illustrated in FIG. 1.

Video player 100 makes HTTP requests 102 to an internet-based video server 101, requesting video segments 104 at a specific bitrate R. As video segments 104 are received they are placed in playback buffer 106. Buffer occupancy is determined by the difference between the rate at which video segments 104 are downloaded into playback buffer 106 and the rate at which video segments 104 are removed from playback buffer 106 for rendering on a video display screen.

Video can be modeled as a set of consecutive video segments or chunks, V={1, 2, . . . , K}, 104, each of which contains L seconds of video and is encoded at different bitrates. Thus, the total length of the video is K×L seconds. The video player can choose to download video segment k with bitrate R_(k)∈R, where R is the set of all available bitrate levels. The amount of data in segment k, denoted d_(k)(R_(k)) in the equations that follow, is then L×R_(k). The higher the selected bitrate, the higher the video quality perceived by the user. Let q(·):R→R₊ be the function which maps the selected bitrate R_(k) to the video quality perceived by the user, q(R_(k)). The assumption is that q(·) is increasing.

The video segments are downloaded into a playback buffer, 106 as shown in FIG. 1, which contains downloaded but as yet unviewed video. Let B(t) ∈[0, B_(max)] be the buffer occupancy 108 at time t, i.e., the play time of the video remaining in the buffer. The buffer size B_(max) depends on the policy of the service provider, as well as storage limitations.

FIG. 2 helps illustrate the operation of the video player. At time t_(k), the video player starts to download segment k. The downloading time depends on the selected bitrate R_(k) as well as average download speed C_(k). At time t_(k+1), when segment k is completely downloaded, the video player immediately starts to download the next segment k+1. If C_(t) denotes the bandwidth at time t, then:

$\begin{matrix} {t_{k + 1} = {t_{k} + \frac{d_{k}\left( R_{k} \right)}{C_{k}} + {\Delta \; t_{k}}}} & (1) \\ {C_{k} = {\frac{1}{t_{k + 1} - t_{k} - {\Delta \; t_{k}}}{\int_{t_{k}}^{t_{k + 1} - {\Delta \; t_{k}}}{C_{t}\, dt}}}} & (2) \end{matrix}$

The buffer occupancy B(t) evolves as the chunks are being downloaded and the video is being played. Specifically, the buffer occupancy increases by L seconds after chunk k is downloaded and decreases as the user watches the video. Let B_(k)=B(t_(k)) denote the buffer occupancy when the player starts to download chunk k. The buffer dynamics can then be formulated as:

$\begin{matrix} {B_{k + 1} = \left( {\left( {B_{k} - \frac{d_{k}\left( R_{k} \right)}{C_{k}}} \right)_{+} + L - {\Delta \; t_{k}}} \right)_{+}} & (3) \end{matrix}$

An example of buffer dynamics is shown in FIG. 2.

The determination of the waiting time Δt_(k), also referred to as the chunk scheduling problem, is an equally interesting and important problem in improving fairness of multi-player video streaming. It is assumed that the player immediately starts to download chunk k+1 as soon as chunk k is downloaded. The one exception is when the buffer is full, at which time the player waits for the buffer to reduce to a level which allows chunk k to be appended. Formally,

$\begin{matrix} {{\Delta \; t_{k}} = \left( {\left( {B_{k} - \frac{d_{k}\left( R_{k} \right)}{C_{k}}} \right)_{+} + L - B_{\max}} \right)_{+}} & (4) \end{matrix}$

The ultimate goal of bitrate adaptation is to improve the QoE of users to achieve higher long-term user engagement. A flexible QoE model, as opposed to a fixed notion of QoE is therefore used. While users may differ in their specific QoE functions, the key elements of video QoE are enumerated as:

Average Video Quality—The average per-chunk quality over all chunks:

$\frac{1}{K}{\sum_{k = 1}^{K}{q\left( R_{k} \right)}}$

Average Quality Variations—This tracks the magnitude of the changes in the quality from one chunk to another:

$\frac{1}{K}{\sum_{k = 1}^{K - 1}{{{q\left( R_{k + 1} \right)} - {q\left( R_{k} \right)}}}}$

Total Rebuffer Time—For each chunk k, rebuffering occurs if the download time d_(k)(R_(k))/C_(k) is higher than the playback buffer level when the chunk download started (i.e., B_(k)). Thus the total rebuffer time is:

$\sum_{k = 1}^{K}\left( {\frac{d_{k}\left( R_{k} \right)}{C_{k}} - B_{k}} \right)_{+}$

Alternatively, the number of rebufferings could be used in lieu of total rebuffer time:

$\sum_{k = 1}^{K}{1\left( {\frac{d_{k}\left( R_{k} \right)}{C_{k}} > B_{k}} \right)}$

Lastly, Startup Delay T_(s), assuming T_(s)<<B_(max).

As users may have different preferences on which of the four components is more important to them, the QoE of video segments 1 through K is defined by a weighted sum of the aforementioned components:

$\begin{matrix} {{QoE}_{1}^{K} = {{\sum\limits_{k = 1}^{K}\; {q\left( R_{k} \right)}} - {\lambda {\sum\limits_{k = 1}^{K - 1}\; \left| {{q\left( R_{k + 1} \right)} - {q\left( R_{k} \right)}} \right|}} - {\mu {\sum\limits_{k = 1}^{K}\; \left( {\frac{d_{k}\left( R_{k} \right)}{C_{k}} - B_{k}} \right)_{+}}} - {\mu_{s}T_{s}}}} & (5) \end{matrix}$

Here λ, μ and μ_(s) are non-negative weighting parameters corresponding to video quality variations, rebuffering time and startup delay, respectively. A relatively small λ indicates that the user is not particularly concerned about video quality variability; the larger λ is, the more effort is made to achieve smoother changes of bitrates. A large μ, relative to the other parameters, indicates that a user is deeply concerned about rebuffering. In cases where users prefer low startup delay, a large μ_(s) is employed.
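The weighted sum of equation (5) can be sketched in Python as follows. The function name `qoe`, its argument names and the default identity quality function q(R)=R are assumptions made for illustration only; any increasing q(·) could be substituted.

```python
def qoe(bitrates, download_times, buffers, startup_delay, lam, mu, mu_s,
        q=lambda r: r):
    """Evaluate equation (5) for one session.

    bitrates[k]       : selected bitrate R_k
    download_times[k] : d_k(R_k) / C_k in seconds
    buffers[k]        : buffer level B_k when the download of chunk k starts
    q                 : quality function (identity by default, an assumption)
    """
    quality = sum(q(r) for r in bitrates)
    variation = sum(abs(q(b) - q(a)) for a, b in zip(bitrates, bitrates[1:]))
    rebuffer = sum(max(d - b, 0.0) for d, b in zip(download_times, buffers))
    return quality - lam * variation - mu * rebuffer - mu_s * startup_delay
```

Raising `mu` relative to `lam` and `mu_s` in this sketch penalizes plans that stall, which mirrors the role of μ described above.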

This definition of QoE is very general and allows customization, so it can easily take into account a user's preferences, and it could be extended as needed to incorporate other factors. As can be seen in FIG. 1, the QoE preferences 120 of the user are one of the factors used by bitrate controller 116 to determine the bitrate 118 of subsequent requests 102 for video chunks.

The problem of bitrate adaptation for QoE maximization can therefore be formulated in the following way:

$\begin{matrix} {\max\limits_{R_{1},\ldots \mspace{14mu},R_{K},T_{s}}\; {QoE}_{1}^{K}} & (6) \\ {{{s.t.\mspace{14mu} t_{k + 1}} = {t_{k} + \frac{d_{k}\left( R_{k} \right)}{C_{k}} + {\Delta \; t_{k}}}},} & (7) \\ {{C_{k} = {\frac{1}{t_{k + 1} - t_{k} - {\Delta \; t_{k}}}{\int_{t_{k}}^{t_{k + 1} - {\Delta \; t_{k}}}{C_{t}\, dt}}}},} & (8) \\ {{B_{k + 1} = \left( {\left( {B_{k} - \frac{d_{k}\left( R_{k} \right)}{C_{k}}} \right)_{+} + L - {\Delta \; t_{k}}} \right)_{+}},} & (9) \\ {{B_{1} = T_{s}},{B_{k} \in \left\lbrack {0,B_{\max}} \right\rbrack}} & (10) \\ {{R_{k} \in R},{{\forall k} = 1},\ldots \mspace{14mu},{K.}} & (11) \end{matrix}$

This can be denoted as QoE_MAX₁ ^(K).

The bandwidth trace C_(t), t∈[t₁, t_(K+1)] serves as input to the problem. The outputs of QoE_MAX₁ ^(K) are the bitrate decisions R₁, . . . , R_(K) and the startup time T_(s).

Note that the problem QoE_MAX₁ ^(K) is formulated assuming the video playback has not started at the time of this optimization, so the start-up delay T_(s) is a decision variable. However, this QoE maximization can also take place during video playback at time t_(k₀), when the next chunk to download is k₀ and the current buffer occupancy is B_(k₀). In this case, the variable T_(s) can be dropped and the corresponding steady state problem is denoted as QOE_MAX_STEADY_(k₀)^(K).

A source of randomness is the bandwidth C_(t): at time t_(k), when the video player chooses bitrate R_(k), only the past bandwidth {C_(t), t≤t_(k)} is available, while the future values {C_(t), t>t_(k)} are not known. However, a throughput predictor 110 can be used to obtain predictions for future available bandwidth 114 based on past throughput 112, defined as {Ĉ_(t), t>t_(k)}. Based on such predictions 114, on buffer occupancy information 108 (which, in contrast, is known precisely) and on the QoE preferences 120 of the user, the bitrate controller 116 selects bitrate 118 of the next segment k:

R_(k) = f(B_(k), {Ĉ_(t), t>t_(k)}, {R_(i), i<k}).  (12)
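The specification leaves the choice of throughput predictor 110 open. As one possible illustration only, the following Python sketch predicts future throughput as the harmonic mean of the throughput observed over the last few chunk downloads. The harmonic mean, the function name `predict_throughput` and the window size are assumptions made for this example and are not mandated by the text.

```python
def predict_throughput(past_throughputs, window=5):
    """Predict future throughput from recent per-chunk measurements.

    Uses the harmonic mean of the last `window` samples, which is less
    sensitive to short throughput spikes than the arithmetic mean.
    This predictor is an illustrative assumption, not the specified one.
    """
    recent = past_throughputs[-window:]
    if not recent:
        raise ValueError("need at least one throughput sample")
    return len(recent) / sum(1.0 / c for c in recent)
```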

Note that the basic MPC algorithms assume the existence of an accurate throughput predictor. However, in certain severe network conditions, e.g., in cellular networks or in prime time when the Internet is congested, such accurate predictors may not be available. For example, if the predictor consistently overestimates the throughput, it may induce high rebuffering. To counteract the prediction error, a robust MPC algorithm is presented. Robust MPC optimizes the worst-case QoE assuming that the actual throughput can take any value in a range $[\underline{\hat{C}_{t}}, \overline{\hat{C}_{t}}]$, bounded below and above, in contrast to a point estimate Ĉ_(t). Robust MPC entails solving the following optimization problem at time t_(k) to get bitrate R_(k):

$R_{k} = {f_{robustmpc}\left( {R_{k - 1},B_{k},\left\lbrack {\underline{{\hat{C}}_{t}},\overline{{\hat{C}}_{t}}} \right\rbrack} \right)}\text{:}\mspace{14mu} \max\limits_{R_{k},\ldots,R_{k + N - 1}}\;\min\limits_{C_{t} \in {\lbrack{\underline{{\hat{C}}_{t}},\overline{{\hat{C}}_{t}}}\rbrack}}{QoE}_{k}^{k + N - 1}$

subject to the constraints of equations (7) through (11).

In general, it may be non-trivial to solve such a max-min robust optimization problem. In this case, however, the worst case scenario takes place when the throughput is at its lower bound, $C_{t} = \underline{\hat{C}_{t}}$. Thus, the implementation of robust MPC is straightforward. Instead of the point estimate Ĉ_(t), the lower bound $\underline{\hat{C}_{t}}$ is used as the input to the MPC QoE maximization problem.
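The text does not specify how the lower bound is obtained. One hypothetical way, shown here purely as a sketch, is to discount the point estimate by the largest relative prediction error seen recently. The function name `robust_throughput` and the discounting rule are assumptions for illustration.

```python
def robust_throughput(prediction, max_relative_error):
    """Return a conservative throughput to feed to the MPC optimizer.

    prediction         : point estimate of future throughput
    max_relative_error : largest recent relative prediction error (>= 0);
                         how this is measured is an assumption of the sketch
    """
    if max_relative_error < 0:
        raise ValueError("max_relative_error must be non-negative")
    return prediction / (1.0 + max_relative_error)
```

With a prediction of 3 Mbps and a recent worst-case error of 50%, the sketch plans against 2 Mbps instead of 3 Mbps.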

To verify the invention's improved QoE over current methods, a normalized QoE metric was defined to compare performance of available video playback systems. These systems, along with the invention, were compared to the optimal possible performance, that which could be achieved if the future bandwidth of the network was known.

For a given bandwidth trace {C_(t), t∈[t₁, t_(K+1)]}, the offline optimal QoE, denoted by QoE(OPT), is the maximum QoE that can be achieved with perfect knowledge of future bandwidth over the entire time horizon.

Technically, it is calculated by solving problem QoE_MAX₁ ^(K). While the assumption of knowing the entire future is not true in reality, the offline solution provides a theoretical upper bound for all systems for a particular bandwidth trace.

On the other hand, the online QoE of a bitrate selection system A is calculated under the assumption that at time t_(k), the bitrate controller only knows the past bandwidth {C_(t), t∈[t₁, t_(k)]}. Based on this, R_(k) (i.e., the bitrate 118 for the next video segment) is selected. The online QoE achieved by system A can be denoted by QoE(A).

Because the offline optimal solution assumes perfect knowledge of the future, the online QoE of any video playback system never exceeds the offline optimal QoE. In other words, QoE(OPT) is an upper bound on the online QoE achieved by any video playback system. To this end, the normalized QoE of A, n-QoE(A), is defined as the performance metric for a system A:

${n\text{-}{{QoE}(A)}} = \frac{{QoE}(A)}{{QoE}({OPT})}$

FIG. 3 shows a high-level overview of the workflow of the MPC algorithm for bitrate adaptation. The algorithm essentially chooses bitrate R_(k) by looking N steps ahead (i.e., the moving horizon), and solves a specific QoE maximization problem (this depends on whether the player is in steady or startup phase) with throughput predictions {Ĉ_(t), t∈[t_(k), t_(k+N)]}, or Ĉ_([t_(k), t_(k+N)]). The first bitrate R_(k) is applied by using feedback information and the optimization process is iterated at each step k.

At iteration k, the player maintains a moving horizon from chunk k to k+N−1 and carries out the following three key steps, as shown in Algorithm 1.

1. Predict: Predict throughput Ĉ_([t_(k), t_(k+N)]) for the next N chunks using some throughput predictor. The actual prediction mechanism relies on existing approaches. Improving the accuracy of this prediction will improve the gains achieved via MPC. That said, MPC can be extended to be robust to prediction errors, as discussed above.

2. Optimize: This is the core of the MPC algorithm: Given the current buffer occupancy B_(k), previous bitrate R_(k−1) and throughput prediction Ĉ_([t_(k), t_(k+N)]), find optimal bitrate R_(k). In steady state, R_(k) = f_(mpc)(R_(k−1), B_(k), Ĉ_([t_(k), t_(k+N)])), implemented by solving

QOE_MAX_STEADY_(k)^(k+N−1)

In the start-up phase, it also optimizes start-up time T_(s) as:

[R_(k), T_(s)] = f_(mpc)^(st)(R_(k−1), B_(k), Ĉ_([t_(k), t_(k+N)]))

implemented by solving

QoE_MAX_(k)^(k+N−1)

If practical details about computational overhead are ignored, off-the-shelf solvers such as CPLEX can be used to solve these discrete optimization problems.

3. Apply: Start to download chunk k with R_(k) and move the horizon forward. If the player is in start-up phase, wait for T_(s) before starting playback.
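The optimize step in steady state can be sketched in Python by exhaustively enumerating every bitrate plan over the horizon, which stands in for a solver and is practical only for small horizons. The function name `mpc_select_bitrate`, the identity quality function q(R)=R, the assumption d_(k)(R_(k))=L×R_(k), and the use of one predicted throughput value per chunk are all assumptions made for illustration.

```python
from itertools import product


def mpc_select_bitrate(buffer_s, prev_bitrate, predicted_bw, bitrates,
                       chunk_len_s, buffer_max_s, lam=1.0, mu=1.0):
    """Return the first bitrate of the best plan over the horizon.

    predicted_bw holds one predicted throughput per chunk in the horizon,
    in the same units as the bitrates (e.g. Mbps). Brute-force enumeration
    replaces a solver here; this is a sketch, not the specified method.
    """
    best_score, best_first = float("-inf"), None
    for plan in product(bitrates, repeat=len(predicted_bw)):
        buf, last, score = buffer_s, prev_bitrate, 0.0
        for rate, bw in zip(plan, predicted_bw):
            download_s = rate * chunk_len_s / bw          # d_k(R_k)/C_k with d_k = L*R_k
            rebuffer_s = max(download_s - buf, 0.0)
            drained_s = max(buf - download_s, 0.0)
            # Equations (3) and (4) combined: the buffer never exceeds B_max.
            buf = min(drained_s + chunk_len_s, buffer_max_s)
            score += rate - lam * abs(rate - last) - mu * rebuffer_s
            last = rate
        if score > best_score:
            best_score, best_first = score, plan[0]
    return best_first
```

Only the first bitrate of the best plan is applied; at the next chunk the horizon moves forward and the enumeration is repeated with fresh predictions and the measured buffer level.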

This workflow has several qualitative advantages compared with buffer-based (BB) and rate-based (RB) approaches. First, the MPC algorithm uses both throughput prediction and buffer information in a principled way. Second, compared to pure RB approaches, MPC smooths out prediction error at each step and is more robust to prediction errors. Specifically, by optimizing several chunks over a moving horizon, large prediction errors for one particular chunk will have lower impact on the performance. Third, MPC directly optimizes a formally defined QoE objective, while in RB and BB the tradeoff between different QoE factors is not clearly defined and therefore can only be addressed in an ad hoc qualitative manner.

Experimentation using this invention over a wide variety of network conditions has shown a higher normalized QoE compared to existing video playback systems.

Lastly, as opposed to rate-based and buffer-based algorithms, which need relatively minor computations, the challenge with MPC is that a discrete optimization problem needs to be solved at each time step. There are two practical concerns here.

(1) Computational overhead: The high computational overhead of MPC is especially problematic for low-end mobile devices, which are projected to be the dominant video consumers going forward. Since the bitrate adaptation decision logic is called before the player starts to download each chunk, excessive delay in the bitrate adaptation logic will negatively affect the QoE of the player.

(2) Deployment: Because there is no closed-form or combinatorial solution for the QoE maximization problem, a solver (e.g., CPLEX or Gurobi) will need to be used. However, it may not be possible for video players to be bundled with such solver capabilities; e.g., licensing issues may preclude distributing such software, or it may require additional plugin or software installations, which pose significant barriers to adoption.

Therefore, it is evident that the solution should be lightweight and combinatorial (i.e., not solving an LP or ILP online). As such, also presented herein is a fast and low-overhead FastMPC design that does not require any explicit solver capabilities in the video player.

At a high level, FastMPC algorithms essentially follow a table enumeration approach. Here, an offline step of enumerating the state-space and solving each specific instance is performed. Then, in the online step, these stored optimal control decisions, mapped to the current operating conditions, are used. That is, the algorithm is reduced to a simple table lookup indexed by the key value closest to the current state, and the output of the lookup is the optimal solution for the selected configuration.

As shown in FIG. 4, the state-space is determined by the following dimensions: (1) current buffer level, (2) previous bitrates chosen, and (3) the predicted throughput for the next N chunks (i.e., the planning horizon). Thus, FastMPC will entail enumerating potential scenarios capturing different values for each dimension and solving the optimization problems offline.

Unfortunately, directly using this idea will be very inefficient because of the high dimensional state space. For instance, if there are 100 possible values for the buffer level, 10 possible bitrates, a horizon of size 5, and 1000 possible throughput values, there will be 10¹⁸ rows in the table. There are two obvious consequences of this large state space. First, it may not be practical to explicitly store the full table in the memory, causing any implementation to have a very high memory footprint along with a large startup delay, as the table will need to be downloaded to the player module. Second, it will incur a non-trivial offline computation cost that may need to be rerun as the operating conditions change.
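The row count in this example follows from multiplying the number of values along each dimension, under the assumption that the table is keyed on one buffer level, one previous bitrate, and one throughput value for each of the 5 chunks in the horizon:

$100 \times 10 \times 1000^{5} = 10^{2} \times 10^{1} \times 10^{15} = 10^{18}$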

There are two key optimizations that will make FastMPC practical.

Compaction via binning: To address the offline exploration cost, it should be realized that very fine-grained values for the buffer and the throughput levels may not be needed. As a consequence, these values may be suitably coarsened into aggregate bins. Moreover, with binning, row keys do not need to be explicitly stored, as these are directly computed from the bin row indices. The challenge is to balance the granularity of binning against the loss of optimality. In practice, using approximately 100 bins for buffer level and 100 bins for throughput predictions works well and yields near-optimal performance.
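A minimal Python sketch of how a row index might be computed directly from bin indices, so that no keys are stored, is given below. The equal-width bins, the single aggregate throughput value per lookup, the row layout and the names `bin_index` and `row_index` are all assumptions made for illustration.

```python
def bin_index(value, low, high, n_bins):
    """Map a value to one of n_bins equal-width bins over [low, high]."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return int((value - low) / (high - low) * n_bins)


def row_index(buffer_s, prev_level, throughput, *, buffer_max_s, tp_max,
              n_buf=100, n_tp=100, n_levels=5):
    """Compute the table row for the current state (hypothetical layout).

    prev_level is the index of the previously chosen bitrate level.
    Rows are ordered by buffer bin, then bitrate level, then throughput bin.
    """
    b = bin_index(buffer_s, 0.0, buffer_max_s, n_buf)
    c = bin_index(throughput, 0.0, tp_max, n_tp)
    return (b * n_levels + prev_level) * n_tp + c
```

Under these assumptions the table has 100 × 5 × 100 = 50,000 rows, and the online step needs only two divisions and a few multiplications to locate a row.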

Table compression: The decision table learned by the offline computation has significant structure. Specifically, the optimal solutions for several similar scenarios will likely be the same. Thus, this structure can be exploited, in conjunction with the binning strategy, by a simple lossless compression strategy that uses run-length encoding to store the decision vector. The optimal decision can then be retrieved online using binary search. In practice, with compression, the table occupies less than 60 kB with 100 bins for buffer levels, 100 bins for throughput predictions and 5 bitrate levels.
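Run-length encoding with binary-search retrieval can be sketched in Python as follows. The representation as two parallel lists (the starting row of each run and the decision for that run) and the names `rle_compress` and `rle_lookup` are assumptions made for illustration.

```python
from bisect import bisect_right


def rle_compress(decisions):
    """Run-length encode a flat decision vector.

    Returns (starts, values): starts[i] is the first row of run i and
    values[i] is the decision shared by every row in that run.
    """
    starts, values = [], []
    for row, decision in enumerate(decisions):
        if not values or decision != values[-1]:
            starts.append(row)
            values.append(decision)
    return starts, values


def rle_lookup(starts, values, row):
    """Find the run containing `row` by binary search and return its decision."""
    if row < 0:
        raise IndexError("row must be non-negative")
    return values[bisect_right(starts, row) - 1]
```

A vector such as `[0, 0, 0, 1, 1, 2, 2, 2, 2]` compresses to three runs, and each lookup costs a number of comparisons logarithmic in the number of runs rather than in the number of rows.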

The invention may be implemented in any video player 100, as defined herein, as, for example, a built-in feature, an add-on, a downloadable app, a piece of software, etc., or in any other way of implementation, currently known or yet to be developed.

Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limiting to the details shown. Rather, various modifications may be made in the details without departing from the invention. 

We claim:
 1. A streaming video playback device comprising: a video display screen; a video display subsystem with a playback buffer; a network interface; and a processor, said processor running software performing the functions of: requesting and downloading video segments from a video server at a specific bitrate; storing downloaded video segments in said playback buffer; removing video segments from said playback buffer and rendering said video segments on said video display screen; wherein the bitrate at which succeeding video segments are requested is determined using model predictive control based upon expressed preferences of a user of said device.
 2. The device of claim 1 wherein said model predictive control is used to maximize the quality of experience of said user.
 3. The device of claim 2 wherein said quality of experience of said user is expressed as a set of weighted factors based on said expressed preferences of said user.
 4. The device of claim 3 wherein said weighted factors include average video quality, average quality variations, total rebuffer time or number of rebufferings and start-up delay.
 5. The device of claim 3 wherein said software performs the further function of predicting the likely throughput for the next several downloads of video segments.
 6. The device of claim 5 wherein said predictions of likely throughput are based on the throughput for the last several video segment downloads.
 7. The device of claim 5 wherein said bitrate selection for each subsequent download is selected to maximize said quality of experience as a function of said predicted throughput.
 8. The device of claim 7 wherein said quality of experience is also a function of the current occupancy of said playback buffer.
 9. The device of claim 8 wherein said quality of experience is also a function of playback buffer dynamics.
 10. The device of claim 4 wherein said average video quality is expressed by the function $\frac{1}{K}{\sum_{k = 1}^{K}{{q\left( R_{k} \right)}.}}$
 11. The device of claim 10 wherein said average quality variations are expressed by the function $\frac{1}{K}{\sum_{k = 1}^{K - 1}\left| {{q\left( R_{k + 1} \right)} - {q\left( R_{k} \right)}} \right|}$
 12. The device of claim 11 wherein said total rebuffer time is expressed by the function $\sum_{k = 1}^{K}\left( {\frac{d_{k}\left( R_{k} \right)}{C_{k}} - B_{k}} \right)_{+}$
 13. The device of claim 12 wherein said quality of experience is expressed by the function: ${QoE}_{1}^{K} = {{\sum\limits_{k = 1}^{K}\; {q\left( R_{k} \right)}} - {\lambda {\sum\limits_{k = 1}^{K - 1}\; \left| {{q\left( R_{k + 1} \right)} - {q\left( R_{k} \right)}} \right|}} - {\mu {\sum\limits_{k = 1}^{K}\; \left( {\frac{d_{k}\left( R_{k} \right)}{C_{k}} - B_{k}} \right)_{+}}} - {\mu_{s}T_{s}}}$
 14. A streaming video playback device comprising: a video display screen; a video display subsystem with a playback buffer; a network interface; and a processor, said processor running software, said software consisting of: an HTTP module, for requesting video segments from a video server at a specific bitrate and for storing received video segments in said playback buffer; a throughput predictor module, for predicting future likely throughput based on throughput realized from preceding downloads of video segments; and a bitrate controller, for determining the said specific bitrate at which the next video segment should be requested, based on said throughput predictions and playback buffer dynamics; wherein said specific bitrate at which succeeding video segments are requested is determined using model predictive control based upon expressed preferences of a user of said device.
 15. The device of claim 14 wherein said model predictive control is used to maximize the quality of experience of said user.
 16. The device of claim 15 wherein said quality of experience of said user is expressed as a set of weighted factors based on said expressed preferences of said user. 